ROCm 6.1 libraries #35998

Draft · wants to merge 45 commits into base: master

@AngryLoki (Contributor) commented Mar 30, 2024

This PR fixes multiple issues in previous 6.1 ROCm packages (critical ones, which caused these packages to be masked in 563b5ab).

Major things to note:

  • This release adds new targets gfx940/gfx941/gfx942 to profiles/desc.
  • The update in dev-libs/half relocates headers from /usr/include to /usr/include/half. However, the transition is smooth, as all current packages already expect to find half.hpp in /usr/include/half.
  • In dev-util/rocm-smi the soname is now librocm_smi64.so.6 (it was librocm_smi64.so.1, as in Debian). It now matches the official/Fedora scheme.
  • This release was tested on LLVM 17. Meanwhile, the AMD team used their LLVM fork, somewhere between 17 and 18, with custom patches. Technically, all 6.0.2 packages can be compiled with LLVM 18, but I did not enable it because of llvm/llvm-project#86332 ("[AMDGPU] With Clang>17, -amdgpu-early-inline-all=true consumes 8x more memory").
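
For code that loads the rocm-smi library by soname at run time, the soname change noted above can be absorbed with a simple fallback. The following is a minimal sketch, not part of this PR: the helper name `load_first_available` and the injectable `loader` parameter are hypothetical and exist only for illustration. The two sonames are the ones mentioned above.

```python
import ctypes

# Candidate sonames, newest first. Both names are taken from the note above.
ROCM_SMI_SONAMES = ("librocm_smi64.so.6", "librocm_smi64.so.1")


def load_first_available(sonames=ROCM_SMI_SONAMES, loader=ctypes.CDLL):
    """Return (soname, handle) for the first library that loads.

    Hypothetical helper: `loader` defaults to ctypes.CDLL and is only
    injectable so that the fallback logic can be exercised without the
    library being installed. Returns (None, None) if nothing loads.
    """
    for name in sonames:
        try:
            return name, loader(name)
        except OSError:
            # Library with this soname is not present; try the next one.
            continue
    return None, None
```

Calling `load_first_available()` with no arguments tries the new soname first and falls back to the old one, so the same caller works before and after the update.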

New packages added to gradually close the gap with the official release:

  • dev-libs/hipother - ROCclr runtime implementation for non-AMD HIP platforms, like NVIDIA
  • sci-libs/rpp - AMD ROCm Performance Primitives (RPP) high-performance computer vision library
  • dev-libs/rocdbgapi - AMD Debugger API - dev-debug/gdb-14.2 can debug AMDGPU code, if you build/link it with rocdbgapi.

Not added:

  • hipBLASLt - 6.0.2 has multiple compilation issues, which were fixed in the master branch. It will be needed in caffe2 later (unless pytorch/pytorch#120551, "Optionally use hipblaslt", is merged), but there is still some time to wait for the next release.

Closes: https://bugs.gentoo.org/927274

@gentoo-bot

Pull Request assignment

Submitter: @AngryLoki
Areas affected: ebuilds, eclasses, profiles
Packages affected: dev-build/rocm-cmake, dev-libs/half, dev-libs/hipother, dev-libs/rccl, dev-libs/rocdbgapi...

@gentoo/github: Too many disjoint maintainers, disabling auto-assignment.

Linked bugs

Bugs linked: 927274


In order to force reassignment and/or bug reference scan, please append [please reassign] to the pull request title.

Docs: Code of Conduct ● Copyright policy (expl.) ● Devmanual ● GitHub PRs ● Proxy-maint guide

@gentoo-bot added the "new package", "need assignment", and "bug linked" labels Mar 30, 2024
@AngryLoki
Contributor Author

Also I would be grateful if you could merge these PRs:

so I could resolve git conflicts (if any) and apply test suite patches from there.

@gentoo-repo-qa-bot
Collaborator

Pull request CI report

Report generated at: 2024-03-30 10:52 UTC
Newest commit scanned: febb857
Status: ✅ good

There are existing issues already. Please look into the report to make sure none of them affect the packages in question:
https://qa-reports.gentoo.org/output/gentoo-ci/2ced917aff/output.html

@AngryLoki marked this pull request as draft May 6, 2024 09:27
@heroxbd self-assigned this May 18, 2024
@heroxbd removed the "need assignment" label May 18, 2024
@heroxbd self-requested a review May 18, 2024 03:51
@littlewu2508
Contributor

rocm.eclass for ROCm-6 is ready now: #36254

@AngryLoki changed the title from "ROCm 6.0.2" to "ROCm 6.1 libraries" May 19, 2024
The previous idea to install the header to /usr/include was unfortunately not very good.
As Gentoo ships the version from ROCm/half, the correct place is /usr/include/half,
per https://github.com/ROCm/half/blob/rocm-6.0.2/CMakeLists.txt#L27

When half.hpp is installed directly into /usr/include, it causes issues with every ROCm component,
including MIOpen, MIVisionX, AMDMIGraphX, rpp, MIFin, and rocAL. These projects, as well as some
non-ROCm projects, include <half/half.hpp>.
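
The reason the old layout breaks these projects is that `#include <half/half.hpp>` only resolves when some include directory contains a `half/` subdirectory holding `half.hpp`. A minimal sketch of that lookup rule follows; the function name `find_half_header` is hypothetical and the code only models the search, it is not how any of the listed projects locate the header.

```python
from pathlib import Path


def find_half_header(include_dirs):
    """Return the first include dir under which <half/half.hpp> resolves.

    Hypothetical helper that mimics the compiler's search for
    `#include <half/half.hpp>`: each directory is checked for the
    relative path half/half.hpp. Returns None when no directory matches,
    which is the situation when half.hpp sits directly in /usr/include.
    """
    for directory in include_dirs:
        if (Path(directory) / "half" / "half.hpp").is_file():
            return Path(directory)
    return None
```

With the header installed as /usr/include/half/half.hpp, `find_half_header(["/usr/include"])` returns `/usr/include`. With the header installed as /usr/include/half.hpp, it returns `None`.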

This change comes with an ebuild revbump and a few follow-up commits:
* sci-libs/composable-kernel-5.7.1-r1 will drop dev-libs/half from dependencies (because it was never needed)
* sci-libs/miopen 5.1.3 and 5.7.1 should use -DHALF_INCLUDE_DIR

Other changes:
* Add myself to maintainers
* Change HOMEPAGE to https://github.com/ROCm/composable_kernel (because sourceforge code is not used)
* Rename ROCmSoftwarePlatform -> ROCm

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
…ch for 6.0.0

Other changes:
* update patch for gfx1012, repeating https://salsa.debian.org/rocm-team/rocm-hipamd/-/commit/76b378eb687133267874c045396b8cb671bb50f1
* update llvm eclass to r1
* add myself as a maintainer

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
* update llvm eclass to r1, which allows specifying the LLVM version more precisely
* add compiler-rt to RDEPEND, as hipcc automatically links to libclang_rt.builtins-x86_64.a

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>

dev-util/hipcc: add myself as a maintainer

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* add support for LLVM 18

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes compared to 5.7.1:
* rename RadeonOpenCompute -> ROCm in url
* patch annoying warnings
* add myself to maintainers
* fix installation of license file

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
This patch, together with the dev-util/hip and dev-libs/rocr-runtime patches, makes it possible to load
a code object from fat binaries based on a compatibility score for the given ISA instead of requiring a full match.
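
The idea of choosing a code object by compatibility score rather than by exact ISA match can be sketched as follows. This is an illustration only and does not reproduce the patch: the scoring rule, the function names, and the `compat` table format are all assumptions, and the ISA names used in the usage note are placeholders rather than real targets.

```python
def compatibility_score(device_isa, object_isa, compat):
    """Score how well a code object built for `object_isa` fits `device_isa`.

    Assumed scoring rule (not taken from the patch): an exact match scores
    highest, ISAs listed in compat[device_isa] score in decreasing order of
    preference, and anything else scores 0 (incompatible).
    """
    fallbacks = tuple(compat.get(device_isa, ()))
    if object_isa == device_isa:
        return len(fallbacks) + 1
    if object_isa in fallbacks:
        return len(fallbacks) - fallbacks.index(object_isa)
    return 0


def pick_code_object(device_isa, available, compat):
    """Return the best-scoring ISA among `available`, or None if none fits."""
    best = max(
        available,
        key=lambda isa: compatibility_score(device_isa, isa, compat),
        default=None,
    )
    if best is None or compatibility_score(device_isa, best, compat) == 0:
        return None
    return best
```

For example, with the placeholder table `{"gfxA": ("gfxB", "gfxC")}`, a fat binary that contains objects for `gfxC` and `gfxB` yields `gfxB` for a `gfxA` device, while a full-match loader would find nothing to load.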

Other changes:
* Rename RadeonOpenCompute -> ROCm
* Add myself to maintainers
* migrate llvm eclass to r1

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
…r gfx1012

This repeats change in https://salsa.debian.org/rocm-team/rocr-runtime/-/commit/da5ad99a9819f42c7c090f95bedf92529637afdc
by Cordell Bloor <cgmb@slerp.xyz>

Other changes:
* rename RadeonOpenCompute -> ROCm
* update llvm eclass to r1
* add myself to maintainers list

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
…ild failure

Closes: https://bugs.gentoo.org/927274
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* add llvm-18 compatibility patch (bug: ROCm/ROCm-Device-Libs#96)
* remove RESTRICT variable, it was shadowed and non-functional
* rename RadeonOpenCompute -> ROCm
* update llvm eclass to r1
* add comment about llvm 18 compatibility issue
* add myself to maintainers

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* add support for LLVM_COMPAT 18

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* fix tests
* make 6.1.0 compatible with LLVM 18

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* support LLVM 18

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Other changes:
* Add myself as maintainer

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* Dropped most of sed calls, not needed anymore
* Device access is not needed to configure
* Added myself as a maintainer

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Also add myself as a maintainer

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes since 5.7.2:
* Added patch for new issue ROCm/rocWMMA#360
* Disabled LTO due to llvm/llvm-project#61101

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* Can be built with GCC; hipcc is not needed directly here
* Rename ROCmSoftwarePlatform -> ROCm in URLs
* Add myself as a maintainer
* Rename ROCmSoftwarePlatform to ROCm in URLs
* Add myself as a maintainer

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* Benchmark tools were renamed from *-rider to *-bench
* Add myself to maintainers
* Drop sed fixes for install path

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* Can be built with gcc; hipcc is not needed directly
* No patches needed
* Added myself to maintainers

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* Access to device is not needed to configure and build
* Add myself to maintainers

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* drop all old patches, except for enable-test
* add new patch for Clang 17 compatibility (official build uses Clang 18)
* new dependency on dev-util/roctracer
* set >=dev-libs/half-1.12.0-r1 dependency to find half/half.hpp automatically
* add myself to maintainers
* add include path for dev-libs/half (works even when half.hpp is installed into /usr/include)

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* in updated expand-isa-compatibility patch do not coerce gfx1011 and gfx1012 to gfx1010, as Gentoo users can build rocBLAS for gfx1011 and gfx1012 with USE flags
* add myself to maintainers

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Changes:
* Drop configure fixes (not needed anymore)
* Add myself to maintainers

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
…2-2.3.0

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
…python 3.12+

Upstream bug: ROCm/rocminfo#69

Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
@gentoo-repo-qa-bot
Collaborator

Pull request CI report

Report generated at: 2024-05-19 19:11 UTC
Newest commit scanned: 0fb6744
Status: ✅ good

There are existing issues already. Please look into the report to make sure none of them affect the packages in question:
https://qa-reports.gentoo.org/output/gentoo-ci/ffe09598ac/output.html

@Bratzmeister

@heroxbd @AngryLoki can we please have review/merge? I eagerly await this <3 sadly rocm 5.7.1 makes my gpu crash often and it's about to be deprecated upstream for newer libraries (pytorch and friends)

@heroxbd
Contributor

heroxbd commented Jun 7, 2024 via email

@AngryLoki
Contributor Author

@Bratzmeister , I'd like to see #36509 merged first. I'll update my PR after that.

@Bratzmeister

Bratzmeister @.***> writes:
@heroxbd @AngryLoki can we please have review/merge? I eagerly await this <3 sadly rocm 5.7.1 makes my gpu crash often and it's about to be deprecated upstream for newer libraries (pytorch and friends)
Thanks for reminding. Can you confirm that ROCm-6 solves your issue?

I currently run ROCm 6 in Docker containers on the same machine, as running 5.7 natively always results in crashes, hangs, or freezes. It's very bothersome, though, as I have multiple use cases for pytorch etc., and maintaining all these containers is very annoying. I'm running a Radeon RX 7900 XT (gfx1100), btw. If I can help with something, let me know.

Labels: bug linked, new package
6 participants